To demonstrate how Feasc can be used to infer pathway activity, we prepared several public scRNA-seq datasets. Users can also download the prepared h5ad files directly. We first load the PBMC3k dataset and attach the Seurat cell type annotations.
import scanpy as sc
import numpy as np
import pandas as pd
import anndata as ad
import scipy.io

adata = sc.read_10x_mtx('data/pbmc3k/', var_names='gene_symbols', cache=True)
cell_anno = pd.read_csv("data/pbmc3k/pbmc3k_seurat_annotation.txt", sep="\t", index_col=0)
adata.obs = adata.obs.join(cell_anno)
adata.obs

| | seurat_annotations |
|---|---|
| AAACATACAACCAC-1 | Memory_CD4_T |
| AAACATTGAGCTAC-1 | B |
| AAACATTGATCAGC-1 | Memory_CD4_T |
| AAACCGTGCTTCCG-1 | CD14+_Mono |
| AAACCGTGTATGCG-1 | NK |
| … | … |
| TTTCGAACTCTCAT-1 | CD14+_Mono |
| TTTCTACTGAGGCA-1 | B |
| TTTCTACTTCCTCG-1 | B |
| TTTGCATGAGAGGC-1 | B |
| TTTGCATGCCTCAC-1 | Naive_CD4_T |
2700 rows × 1 columns
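The annotation step above relies on `DataFrame.join`, which performs a left join on the index. The following is a minimal sketch of that behaviour on toy data; the barcodes and labels are made up for illustration and are not taken from the PBMC3k files. It shows that row order follows the left table and that barcodes missing from the annotation file end up as `NaN`, which is worth checking before saving.

```python
import pandas as pd

# Toy stand-ins for adata.obs (indexed by barcode) and the annotation table.
# All barcodes and labels here are hypothetical.
obs = pd.DataFrame(index=pd.Index(["AAAC-1", "AAAG-1", "AAAT-1"]))
cell_anno = pd.DataFrame(
    {"seurat_annotations": ["B", "NK"]},
    index=pd.Index(["AAAG-1", "AAAC-1"]),  # different order, one barcode missing
)

# A left join keeps every row of `obs`, in its original order,
# and matches annotation rows by index value.
joined = obs.join(cell_anno)

# Barcodes without an annotation become NaN; count them before saving.
n_missing = int(joined["seurat_annotations"].isna().sum())
print(joined)
print(f"cells without annotation: {n_missing}")
```

If the count is nonzero, one option is to drop or relabel those cells explicitly rather than carrying missing values into downstream steps.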
adata.write_h5ad('data/h5ad/pbmc3k.h5ad')

Next, we load single-cell RNA sequencing data from the GSE96583 dataset, which contains immune cells under two conditions: with and without interferon (IFN) stimulation.
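# Read the count matrix (genes × cells) and transpose it to cells × genes
Mat = scipy.io.mmread("data/IFN/matrix.mtx")
X = Mat.T.toarray().astype('int32')
# Cell metadata and gene symbols (second column of the genes file)
obs = pd.read_csv("data/IFN/GSE96583_batch2.total.tsne.df.tsv", sep="\t")
genef = pd.read_csv("data/IFN/GSE96583_batch2.genes.tsv", header=None, sep="\t")
var = pd.DataFrame(genef[1])
var = var.set_index(1).rename_axis('GeneSymbol')
# X is cast to int32 above because the `dtype` argument of AnnData is deprecated in recent anndata releases
adata = ad.AnnData(X, obs=obs, var=var)
adata.obs = adata.obs.set_index('cell_id')
adata.var_names_make_unique()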
adata.obs

| cell_id | tsne1 | tsne2 | ind | stim | cluster | cell | multiplets |
|---|---|---|---|---|---|---|---|
| AAACATACAATGCC-1 | -4.277833 | -19.294709 | 107 | ctrl | 5 | CD4 T cells | doublet |
| AAACATACATTTCC-1 | -27.640373 | 14.966629 | 1016 | ctrl | 9 | CD14+ Monocytes | singlet |
| AAACATACCAGAAA-1 | -27.493646 | 28.924885 | 1256 | ctrl | 9 | CD14+ Monocytes | singlet |
| AAACATACCAGCTA-1 | -28.132584 | 24.925484 | 1256 | ctrl | 9 | CD14+ Monocytes | doublet |
| AAACATACCATGCA-1 | -10.468194 | -5.984389 | 1488 | ctrl | 3 | CD4 T cells | singlet |
| … | … | … | … | … | … | … | … |
| TTTGCATGCTAAGC-1 | 25.142392 | 6.603815 | 107 | stim | 6 | CD4 T cells | singlet |
| TTTGCATGGGACGA-1 | 14.359657 | 10.965601 | 1488 | stim | 6 | CD4 T cells | singlet |
| TTTGCATGGTGAGG-1 | 27.317997 | 7.933458 | 1488 | stim | 6 | CD4 T cells | ambs |
| TTTGCATGGTTTGG-1 | 13.744084 | 9.347784 | 1244 | stim | 6 | CD4 T cells | ambs |
| TTTGCATGTCTTAC-1 | 14.572118 | -4.713942 | 1016 | stim | 5 | CD4 T cells | singlet |
29065 rows × 7 columns
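The loading code above converts the count matrix to a dense array. For larger datasets it may be preferable to keep the matrix sparse, since `AnnData` also accepts SciPy sparse matrices as `X`. The sketch below uses a small hypothetical count matrix (not the GSE96583 data) to show the transpose from genes × cells to cells × genes and the conversion to CSR format.

```python
import numpy as np
from scipy import sparse

# Toy genes x cells count matrix in coordinate (triplet) form, the layout a
# Matrix Market file stores: (row index, column index, value).
# The values are hypothetical.
rows = np.array([0, 2, 3, 3])   # gene indices
cols = np.array([0, 1, 0, 2])   # cell indices
vals = np.array([5, 1, 2, 7], dtype=np.int32)
genes_by_cells = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 3))

# Transpose to cells x genes and convert to CSR, which supports
# efficient row (cell) slicing.
cells_by_genes = genes_by_cells.T.tocsr()

# Compare the number of stored values with the size of the dense equivalent.
dense = cells_by_genes.toarray()
print(cells_by_genes.shape)
print(cells_by_genes.nnz, "stored values vs", dense.size, "dense entries")
```

Whether the sparse form is worthwhile depends on the downstream tools; some functions densify the matrix internally anyway.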
adata.write_h5ad('data/h5ad/GSE96583_IFN.h5ad')

We also use the GSE154109 dataset, which contains scRNA-seq data from acute myeloid leukemia (AML) patients, to demonstrate how dimension reduction enhances cytokine activity inference. Cell type annotations were obtained from the TISCH2 database.
adata = sc.read_h5ad("data/AML_GSE154109/AML_GSE154109.h5ad")
obs = adata.obs
obs.index = obs.index.str.replace('@', '_')
obs

| Cell | UMAP_1 | UMAP_2 | Cluster | Celltype (malignancy) | Celltype (major-lineage) | Celltype (minor-lineage) | Patient | Sample | Tissue |
|---|---|---|---|---|---|---|---|---|---|
| P1_AAATGCCAGACTAGAT-1 | 12.036471 | -3.891038 | 14 | Immune cells | B | B | P1 | AML1 | Tumor |
| P1_AAGGTTCTCAACGGGA-1 | 11.871773 | -4.192421 | 14 | Immune cells | B | B | P1 | AML1 | Tumor |
| P1_ACACCAAGTACCGGCT-1 | 11.403178 | -3.530503 | 14 | Immune cells | B | B | P1 | AML1 | Tumor |
| P1_ACATCAGGTTTAAGCC-1 | 11.493506 | -3.593977 | 14 | Immune cells | B | B | P1 | AML1 | Tumor |
| P1_ACTTGTTGTGGCGAAT-1 | 11.426346 | -3.142435 | 14 | Immune cells | B | B | P1 | AML1 | Tumor |
| … | … | … | … | … | … | … | … | … | … |
| P8_TTAGGCAAGTACGATA-1 | -3.451650 | 3.087100 | 12 | Immune cells | Mono/Macro | cDC1 | P8 | AML8 | Tumor |
| P8_TTCGGTCGTTCAGACT-1 | -3.653885 | 2.907512 | 12 | Immune cells | Mono/Macro | cDC1 | P8 | AML8 | Tumor |
| P8_TTCTCAACAAGCGCTC-1 | -3.589192 | 2.918538 | 12 | Immune cells | Mono/Macro | cDC1 | P8 | AML8 | Tumor |
| P8_TTCTCAAGTGTGACCC-1 | -2.689308 | 3.016534 | 12 | Immune cells | Mono/Macro | cDC1 | P8 | AML8 | Tumor |
| P8_TTGACTTCATGGATGG-1 | -2.064048 | 2.886846 | 12 | Immune cells | Mono/Macro | cDC1 | P8 | AML8 | Tumor |
10799 rows × 9 columns
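The barcode renaming used for this dataset can be checked on a small example. This is a minimal sketch that assumes the original identifiers have the form `<patient>@<barcode>`, as implied by the `@` to `_` replacement; the two identifiers are written out by hand for illustration. Passing `regex=False` makes the literal match explicit, and the uniqueness check guards against two cells collapsing into one identifier.

```python
import pandas as pd

# Hypothetical identifiers in the assumed "<patient>@<barcode>" form.
idx = pd.Index(["P1@AAATGCCAGACTAGAT-1", "P8@TTGACTTCATGGATGG-1"], name="Cell")

# regex=False treats "@" as a literal character rather than a pattern.
renamed = idx.str.replace("@", "_", regex=False)

# Renaming must not merge distinct cells into the same identifier.
assert renamed.is_unique
print(list(renamed))
```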